
 sample splitting


Debiased Machine Learning without Sample-Splitting for Stable Estimators

Neural Information Processing Systems

Estimation and inference on causal parameters is typically reduced to a generalized method of moments problem that involves auxiliary functions, each of which solves a regression or classification problem. A recent line of work on debiased machine learning shows how one can use generic machine learning estimators for these auxiliary problems while maintaining asymptotic normality and root-n consistency of the target parameter of interest, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems be fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the stability properties that we propose are satisfied by ensemble bagged estimators built via sub-sampling without replacement, a popular technique in machine learning practice.
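As a rough illustration of the idea in this abstract, the sketch below fits both nuisance regressions on the full sample with a bagged estimator built from subsamples drawn without replacement, then forms a residual-on-residual estimate of the target parameter with no sample splitting. Everything in it is an assumption made for illustration and is not taken from the paper: the partially linear model, the simulated data, the simple linear base learner, and the names `fit_line`, `bagged_predictions`, `dml_no_split`, and `simulate`.

```python
import random

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def bagged_predictions(xs, ys, n_bags, subsample_size, rng):
    """Fit the base learner on n_bags subsamples drawn WITHOUT replacement,
    average the fits, and predict on the same full sample.
    For a linear base learner, averaging coefficients equals averaging predictions."""
    n = len(xs)
    intercept_sum, slope_sum = 0.0, 0.0
    for _ in range(n_bags):
        idx = rng.sample(range(n), subsample_size)
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        intercept_sum += a
        slope_sum += b
    a_bar, b_bar = intercept_sum / n_bags, slope_sum / n_bags
    return [a_bar + b_bar * x for x in xs]

def dml_no_split(xs, ds, ys, n_bags=50, subsample_size=None, seed=0):
    """Residual-on-residual estimate of theta in the assumed model
    Y = theta * D + g(X) + noise, with nuisances fitted on all of the data."""
    rng = random.Random(seed)
    m = subsample_size or len(xs) // 2
    d_hat = bagged_predictions(xs, ds, n_bags, m, rng)  # estimate of E[D | X]
    y_hat = bagged_predictions(xs, ys, n_bags, m, rng)  # estimate of E[Y | X]
    d_res = [d - dh for d, dh in zip(ds, d_hat)]
    y_res = [y - yh for y, yh in zip(ys, y_hat)]
    return sum(u * v for u, v in zip(d_res, y_res)) / sum(u * u for u in d_res)

def simulate(n, theta, seed):
    """Hypothetical data-generating process used only for this sketch."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ds = [0.8 * x + rng.gauss(0, 1) for x in xs]
    ys = [theta * d + 1.5 * x + rng.gauss(0, 1) for d, x in zip(ds, xs)]
    return xs, ds, ys

xs, ds, ys = simulate(500, 2.0, seed=1)
theta_hat = dml_no_split(xs, ds, ys)
print(round(theta_hat, 3))
```

The subsample size is set to half the data here only as a convenient default; the abstract does not state which subsample sizes its stability results cover.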





Adaptive Off-Policy Inference for M-Estimators Under Model Misspecification

arXiv.org Machine Learning

When data are collected adaptively, such as in bandit algorithms, classical statistical approaches such as ordinary least squares and $M$-estimation will often fail to achieve asymptotic normality. Although recent lines of work have modified the classical approaches to ensure valid inference on adaptively collected data, most of these works assume that the model is correctly specified. We propose a method that provides valid inference for M-estimators that use adaptively collected bandit data with a (possibly) misspecified working model. A key ingredient in our approach is the use of flexible machine learning approaches to stabilize the variance induced by adaptive data collection. A major novelty is that our procedure enables the construction of valid confidence sets even in settings where treatment policies are unstable and non-converging, such as when there is no unique optimal arm and standard bandit algorithms are used. Empirical results on semi-synthetic datasets constructed from the Osteoarthritis Initiative demonstrate that the method maintains type I error control, while existing methods for inference in adaptive settings do not cover in the misspecified case.


Test of partial effects for Fréchet regression on Bures-Wasserstein manifolds

arXiv.org Machine Learning

In many modern applications, positive definite matrices are often used to summarize the marginal covariance structure among sets of variables. Examples include medical imaging (Dryden et al., 2009; Fillard et al., 2007), neuroscience (Friston, 2011; Kong et al., 2020; Hu et al., 2021) and gene coexpression analysis in single cell genomics. A central challenge in these fields is how to perform regression analysis in which the covariance matrix serves as the outcome variable in relation to a set of Euclidean covariates, and how to test for association between these matrices and the covariates. Several regression approaches for covariance matrix outcomes have been proposed. Chiu et al. (1996) developed a method that models the elements of the logarithm of the covariance matrix as a linear function of the covariates, but this approach requires estimating a large number of parameters. Hoff & Niu (2012) proposed a regression model in which the covariance matrix is expressed as a quadratic function of the explanatory variables. Zou et al. (2017) linked the matrix outcome to a linear combination of similarity matrices derived from the covariates and examined the asymptotic properties of different estimators under this framework. Xu & Li (2025) introduced Fréchet regression with the covariance matrix as the outcome.


Thinning a Wishart Random Matrix

arXiv.org Machine Learning

Recent work has explored data thinning, a generalization of sample splitting that involves decomposing a (possibly matrix-valued) random variable into independent components. In the special case of an $n \times p$ random matrix with independent and identically distributed $N_p(\mu, \Sigma)$ rows, Dharamshi et al. (2024a) provide a comprehensive analysis of the settings in which thinning is or is not possible: briefly, if $\Sigma$ is unknown, then one can thin provided that $n>1$. However, in some situations a data analyst may not have direct access to the data itself. For example, to preserve individuals' privacy, a data bank may provide only summary statistics such as the sample mean and sample covariance matrix. While the sample mean follows a Gaussian distribution, the sample covariance follows (up to scaling) a Wishart distribution, for which no thinning strategies have yet been proposed. In this note, we fill this gap: we show that it is possible to generate two independent data matrices with independent $N_p(\mu, \Sigma)$ rows, based only on the sample mean and sample covariance matrix. These independent data matrices can either be used directly within a train-test paradigm, or can be used to derive independent summary statistics. Furthermore, they can be recombined to yield the original sample mean and sample covariance.
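To make the notion of data thinning concrete, here is a minimal sketch of the simplest case only: a scalar Gaussian observation with known variance. It is not the Wishart construction this note proposes, which the abstract does not spell out. The function name `thin_gaussian`, the split fraction `eps`, and the simulation settings are assumptions for illustration. The sketch uses the standard fact that for $X \sim N(\mu, \sigma^2)$ and independent $W \sim N(0, \epsilon(1-\epsilon)\sigma^2)$, the pieces $X_1 = \epsilon X + W$ and $X_2 = X - X_1$ are independent Gaussians that sum back to $X$.

```python
import random

def thin_gaussian(x, sigma, eps, rng):
    """Split one draw x ~ N(mu, sigma^2), with sigma KNOWN, into two parts.

    x1 ~ N(eps * mu, eps * sigma^2) and x2 ~ N((1 - eps) * mu, (1 - eps) * sigma^2)
    are independent, and x1 + x2 recovers x (up to floating-point rounding).
    """
    w = rng.gauss(0.0, sigma * (eps * (1 - eps)) ** 0.5)
    x1 = eps * x + w
    return x1, x - x1

def sample_cov(a, b):
    """Plain sample covariance between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)

# Empirical check on simulated data (hypothetical settings).
rng = random.Random(0)
mu, sigma, eps = 3.0, 2.0, 0.5
draws = [rng.gauss(mu, sigma) for _ in range(20000)]
parts = [thin_gaussian(x, sigma, eps, rng) for x in draws]
x1s = [p[0] for p in parts]
x2s = [p[1] for p in parts]

max_recombine_error = max(abs(a + b - x) for a, b, x in zip(x1s, x2s, draws))
cov_12 = sample_cov(x1s, x2s)
print(max_recombine_error, cov_12)
```

The two parts can then play the roles of a training and a test observation, which is the train-test use the abstract mentions.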


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

We are very grateful for the careful and thorough reviews from all reviewers. Reviewer_1: The reviewer comments that our novel results are both practically and theoretically important, and also suggests that we provide more discussion and examples of our technical conditions, such as the decomposability condition, the subspace compatibility constant, etc. We have done so, though due to space limitations we have pushed the examples illustrating our conditions to the supplementary material, in the section on proof ingredients for the results in Section 5. In detail, for each model, we check and illustrate why the aforementioned technical conditions hold in every subsection of Section B. We will try to illustrate these comments in the main paper. Reviewer_2: The reviewer suggests giving more illustration of why the existing theoretical framework for regularization does not work for inference with latent variables. Briefly, the reason is as follows.


Statistical Inference for Low-Rank Tensor Models

arXiv.org Machine Learning

Statistical inference for tensors has emerged as a critical challenge in analyzing high-dimensional data in modern data science. This paper introduces a unified framework for inferring general and low-Tucker-rank linear functionals of low-Tucker-rank signal tensors for several low-rank tensor models. Our methodology tackles two primary goals: achieving asymptotic normality and constructing minimax-optimal confidence intervals. By leveraging a debiasing strategy and projecting onto the tangent space of the low-Tucker-rank manifold, we enable inference for general and structured linear functionals, extending far beyond the scope of traditional entrywise inference. Specifically, in the low-Tucker-rank tensor regression or PCA model, we establish the computational and statistical efficiency of our approach, achieving near-optimal sample size requirements (in regression model) and signal-to-noise ratio (SNR) conditions (in PCA model) for general linear functionals without requiring sparsity in the loading tensor. Our framework also attains both computationally and statistically optimal sample size and SNR thresholds for low-Tucker-rank linear functionals. Numerical experiments validate our theoretical results, showcasing the framework's utility in diverse applications. This work addresses significant methodological gaps in statistical inference, advancing tensor analysis for complex and high-dimensional data environments.


Double Machine Learning for Adaptive Causal Representation in High-Dimensional Data

arXiv.org Machine Learning

Adaptive causal representation learning from observational data is presented, integrated with an efficient sample splitting technique within the semiparametric estimating equation framework. Support points sample splitting (SPSS), a subsampling method based on energy distance, is employed for efficient double machine learning (DML) in causal inference. In contrast to traditional random splitting, the support points are selected and split as optimal representative points of the full raw data in a random sample, providing an optimal sub-representation of the underlying data generating distribution. They offer the best representation of a full big dataset, whereas traditional random data splitting most likely does not preserve the unit structural information of the underlying distribution. Three machine learning estimators were adopted for causal inference using SPSS: support vector machine (SVM), deep learning (DL), and a hybrid super learner (SL) with deep learning (SDL). A comparative study is conducted between the proposed SVM, DL, and SDL representations using SPSS and the benchmark results from Chernozhukov et al. (2018), which employed random forest, neural network, and regression trees with a random k-fold cross-fitting technique on the 401(k) pension plan real data. The simulations show that DL with SPSS and the hybrid methods of DL and SL with SPSS outperform SVM with SPSS in terms of computational efficiency and estimation quality, respectively.
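Since SPSS is described as being based on energy distance, a small sketch of that quantity may help. The code below computes a plug-in estimate of the energy distance $2\,E|X-Y| - E|X-X'| - E|Y-Y'|$ between two one-dimensional samples and uses it to score how well a subsample represents the full data. It does not implement support points or the SPSS procedure itself; the function names and the simulated data are assumptions for illustration.

```python
import random

def mean_abs_diff(a, b):
    """Average of |a_i - b_j| over all pairs (i, j)."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

def energy_distance(a, b):
    """Plug-in (V-statistic) estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D samples."""
    return 2 * mean_abs_diff(a, b) - mean_abs_diff(a, a) - mean_abs_diff(b, b)

# Hypothetical data: score two candidate subsamples against the full sample.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(400)]
random_half = rng.sample(data, 200)        # an ordinary random split
shifted = [x + 2.0 for x in data[:200]]    # a deliberately unrepresentative subsample

d_same = energy_distance(data, data)
d_half = energy_distance(data, random_half)
d_shift = energy_distance(data, shifted)
print(d_same, d_half, d_shift)
```

A smaller value indicates a subsample whose empirical distribution is closer to that of the full data, which is the sense in which an energy-distance criterion can rank candidate splits.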